{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[{"file_id":"1Mi_dSbygxynRsFE1UMSWsvXQxY53KPHe","timestamp":1740535798634}],"collapsed_sections":["xGahAFpaiGyj"]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["This code accesses the Census dataset and creates CSV files with median house value, median household income, and racial demographic data for all census tracts in the New Jersey Mercer County area."],"metadata":{"id":"d5-DhuJgKFH0"}},{"cell_type":"code","source":["!pip install census\n","!pip install us"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Lnix8eZv02Ow","executionInfo":{"status":"ok","timestamp":1742060848185,"user_tz":240,"elapsed":19177,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}},"outputId":"206f500d-b3bd-4508-8bc9-72a886f4fdc2"},"execution_count":1,"outputs":[{"output_type":"stream","name":"stdout","text":["Collecting census\n","  Downloading census-0.8.23-py3-none-any.whl.metadata (8.1 kB)\n","Requirement already satisfied: requests>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from census) (2.32.3)\n","Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests>=1.1.0->census) (3.4.1)\n","Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests>=1.1.0->census) (3.10)\n","Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests>=1.1.0->census) (2.3.0)\n","Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests>=1.1.0->census) (2025.1.31)\n","Downloading census-0.8.23-py3-none-any.whl (11 kB)\n","Installing collected packages: census\n","Successfully installed census-0.8.23\n","Collecting us\n","  Downloading us-3.2.0-py3-none-any.whl.metadata (10 kB)\n","Requirement already satisfied: 
jellyfish in /usr/local/lib/python3.11/dist-packages (from us) (1.1.0)\n","Downloading us-3.2.0-py3-none-any.whl (13 kB)\n","Installing collected packages: us\n","Successfully installed us-3.2.0\n"]}]},{"cell_type":"code","execution_count":2,"metadata":{"id":"gMWA5kQq0yNC","executionInfo":{"status":"ok","timestamp":1742060851073,"user_tz":240,"elapsed":800,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"outputs":[],"source":["from census import Census\n","from us import states\n","import pandas as pd\n","\n","import requests\n","from requests.adapters import HTTPAdapter\n","from requests.packages.urllib3.util.retry import Retry"]},{"cell_type":"code","source":["from google.colab import drive\n","drive.mount('/content/drive')"],"metadata":{"id":"JaFPrT8s2Edi","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1742060874455,"user_tz":240,"elapsed":22700,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}},"outputId":"5698f3b4-ba5a-42e1-f965-b148034c0ac2"},"execution_count":3,"outputs":[{"output_type":"stream","name":"stdout","text":["Mounted at /content/drive\n"]}]},{"cell_type":"code","source":["# Configure retries for the requests session\n","session = requests.Session()\n","retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])\n","session.mount('http://', HTTPAdapter(max_retries=retries))\n","session.mount('https://', HTTPAdapter(max_retries=retries))\n"],"metadata":{"id":"Gg6WNO2iG-s4","executionInfo":{"status":"ok","timestamp":1742060910405,"user_tz":240,"elapsed":13,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":4,"outputs":[]},{"cell_type":"code","source":["# Census API key\n","# request a key from https://api.census.gov/data/key_signup.html\n","CENSUS_API_KEY = \"(API KEY)\"\n","c = 
Census(CENSUS_API_KEY, session=session)"],"metadata":{"id":"XGYG_SRC04s_","executionInfo":{"status":"ok","timestamp":1742060911500,"user_tz":240,"elapsed":12,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":5,"outputs":[]},{"cell_type":"code","source":["from census import Census\n","from us import states\n","import pandas as pd\n","\n","def fetch_census_data_by_field(state_fips, county_fips, valid_years, field, field_name):\n","    \"\"\"\n","    Fetches data for a specific field (e.g., Median Household Income or Median House Value)\n","    for a specific state and county for each year in the valid_years range.\n","\n","    Parameters:\n","    - state_fips: FIPS code for the state (e.g., \"34\" for New Jersey)\n","    - county_fips: FIPS code for the county (e.g., \"021\" for Mercer County)\n","    - valid_years: List or range of years to fetch data for (e.g., range(2012,2023) for 2012 to 2022)\n","    - field: The field code to query (e.g., 'B19013_001E')\n","    - field_name: The name of the field (e.g., 'Median_Household_Income')\n","\n","    Returns:\n","    - data_all: DataFrame with data for the specified field for the specified years\n","    \"\"\"\n","\n","    # Initialize an empty DataFrame to store yearly data\n","    data_all = pd.DataFrame()\n","\n","    # Fetch data for each year\n","    for year in valid_years:\n","        # Fetch data for the given field and year\n","        data = c.acs5.state_county_tract(\n","            state_fips=state_fips,\n","            county_fips=county_fips,\n","            tract=Census.ALL,  # All Census Tracts\n","            year=year,  # Dynamically specify the year\n","            fields=(field,)  # The field to query (e.g., Median Household Income or Median House Value)\n","        )\n","\n","        # Convert the data to a DataFrame\n","        df = pd.DataFrame(data)\n","\n","        # Rename columns for clarity\n","        df.rename(columns={\n","            field: f'{field_name}_{year}',  
# Add year to column name\n","            'state': 'State_FIPS',\n","            'county': 'County_FIPS',\n","            'tract': 'Census_Tract'  # Ensure tract is renamed correctly\n","        }, inplace=True)\n","\n","        # Merge yearly data\n","        if data_all.empty:\n","            data_all = df\n","        else:\n","            data_all = pd.merge(data_all, df, on=['State_FIPS', 'County_FIPS', 'Census_Tract'], how='outer')\n","\n","    return data_all\n","\n","\n","\n","# Federal Information Processing Standard codes (FIPS)\n","# Change to your state's or county's corresponding FIPS code\n","state_fips = states.NJ.fips  # New Jersey state FIPS (34)\n","county_fips = \"021\"          # Mercer County FIPS (021)\n","\n","# Define valid years for Census API (2012 to 2022)\n","valid_years = range(2012, 2023)"],"metadata":{"id":"Vy3Nv84V-8fn","executionInfo":{"status":"ok","timestamp":1742060915824,"user_tz":240,"elapsed":5,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":6,"outputs":[]},{"cell_type":"markdown","source":["The U.S. Census Bureau revises geographic boundaries of census tracts every 10 years.\n","\n","Certain census tracts are split into several census tracts in order to better reflect changes in population distribution, income, etc.\n","\n","For example, tract 3302 was split into 3303 and 3304 in the year 2020.\n","\n","As a result, tract 3302 only contains data from 2012-2019, while tracts 3303 and 3304 only contain data from 2020-2022."],"metadata":{"id":"Xo_9bt8ujFON"}},{"cell_type":"code","source":["\"\"\"\n","This function is applied to the median household income and median\n","house value.\n","\n","The new census tracts are updated with the old census tract values for\n","the period provided. 
The old census tract is then removed from the\n","dataset.\n","\n","Values are copied rather than split by a ratio because medians are\n","positional statistics, not sums like populations, so scaling them by a\n","ratio would not reflect actual values.\n","\n","The Census Bureau also does not offer sufficiently granular data, so the\n","median cannot be recalculated for each new census tract.\n","\n","\n","The new and old census tract IDs are manually obtained from the Census\n","Bureau.\n","\n","As with the state_fips and county_fips, they should be changed to\n","reflect any changes in boundaries in the state or county.\n","\"\"\"\n","\n","\n","\n","def update_median_census_tracts(final_data, tract_values, columns_to_update, years):\n","    \"\"\"\n","    Update the values of median-based columns for specified census tracts in the\n","    DataFrame using the old census tract data.\n","\n","    Parameters:\n","    - final_data: DataFrame containing the census data.\n","    - tract_values: Dictionary mapping a tuple of new census tract IDs to the old tract's row.\n","    - columns_to_update: List of column names to update.\n","    - years: The years for which the columns are updated. 
--- the first year cannot be before 2010, and the last year cannot be after 2019 (maximum would be range(2010, 2020))\n","\n","    Returns:\n","    - Updated DataFrame with the specified values replaced.\n","    \"\"\"\n","\n","    # Update new census tracts with old census tract values\n","    for new_tracts, old_row in tract_values.items():\n","        for new_tract in new_tracts:  # Iterate over the tuple of new tracts\n","            for column in columns_to_update:\n","                final_data.loc[final_data['Census_Tract'] == new_tract, column] = old_row[column]\n","\n","    # Remove the old census tracts from the DataFrame\n","    old_tracts_to_remove = [old_row['Census_Tract'] for _, old_row in tract_values.items()]\n","    final_data = final_data[~final_data['Census_Tract'].isin(old_tracts_to_remove)]\n","\n","    return final_data"],"metadata":{"id":"dDC-aTWJjAf6","executionInfo":{"status":"ok","timestamp":1742060918393,"user_tz":240,"elapsed":35,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":7,"outputs":[]},{"cell_type":"code","source":["\"\"\"\n","This function is applied to the population statistics.\n","\n","The ratio between the new census tracts for the year 2020 is calculated,\n","after which the old census tract values for the period provided are\n","proportionally distributed to update the new census tracts using this\n","ratio.\n","\n","The old census tract is then removed from the dataset.\n","\n","Since populations are sums, estimating the population distribution in\n","the new census tracts is feasible and able to more accurately reflect\n","actual values than if they were to be copied over from the old census\n","tracts or removed entirely.\n","\n","\n","The new and old census tract IDs are manually obtained from the Census\n","Bureau.\n","\n","As with the state_fips and county_fips, they should be changed to\n","reflect any changes in boundaries in the state or 
county.\n","\"\"\"\n","\n","\n","\n","def update_sum_census_tracts(final_data, tract_values, columns_to_update, years):\n","    \"\"\"\n","    Update the values for sum-related columns for specified census tracts in the\n","    DataFrame using their ratios from the year 2020 and the old census tract data.\n","\n","    Parameters:\n","    - final_data: DataFrame containing the census data.\n","    - tract_values: Dictionary mapping new census tracts to old row values.\n","    - columns_to_update: List of column names to update.\n","    - years: The years for which the columns are updated. --- the first year cannot be before 2010, and the last year cannot be after 2019 (maximum would be range(2010, 2020))\n","\n","    Returns:\n","    - Updated DataFrame with the specified values replaced.\n","    \"\"\"\n","    # Iterate through new tracts and proportionally distribute old data\n","    for new_tracts, old_row in tract_values.items():\n","        new_rows = [final_data[final_data['Census_Tract'] == tract].iloc[0] for tract in new_tracts]\n","        for column in columns_to_update:\n","            total_2020 = sum(row[column[:-4] + '2020'] for row in new_rows)\n","\n","            # Avoid division by zero\n","            ratios = [row[column[:-4] + '2020'] / total_2020 if total_2020 else 0.5 for row in new_rows] # if total_2020 = 0 (meaning that the new census tracts had a sum population of 0), use a 50/50 ratio\n","\n","            for new_tract, ratio in zip(new_tracts, ratios):\n","                final_data.loc[final_data['Census_Tract'] == new_tract, column] = old_row[column] * ratio\n","\n","    # Remove the old census tracts from the DataFrame\n","    old_tracts_to_remove = [old_row['Census_Tract'] for _, old_row in tract_values.items()]\n","    final_data = final_data[~final_data['Census_Tract'].isin(old_tracts_to_remove)]\n","\n","    return 
final_data"],"metadata":{"id":"3CWmFSDH0Jzh","executionInfo":{"status":"ok","timestamp":1742060920331,"user_tz":240,"elapsed":25,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":8,"outputs":[]},{"cell_type":"code","source":["def get_row(tract):\n","    \"\"\"\n","    Parameters:\n","    - tract: The census tract for which to retrieve the row.\n","\n","    Returns:\n","    - Row of data for the specified census tract.\n","    \"\"\"\n","\n","    return final_data[final_data['Census_Tract'] == tract].iloc[0]"],"metadata":{"id":"dl-ZmJ-vjQY_","executionInfo":{"status":"ok","timestamp":1742060922601,"user_tz":240,"elapsed":7,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":9,"outputs":[]},{"cell_type":"markdown","source":["# Median household income and median house value"],"metadata":{"id":"s5O5DH8z3O45"}},{"cell_type":"code","source":["# Initialize DataFrames for each variable to store yearly data\n","income_data_all = fetch_census_data_by_field(\n","    state_fips,\n","    county_fips,\n","    valid_years,\n","    field='B19013_001E',\n","    field_name='Median_Household_Income'\n",")  # Fields and subfields obtained from ACS5 Census data\n","\n","house_value_data_all = fetch_census_data_by_field(\n","    state_fips,\n","    county_fips,\n","    valid_years,\n","    field='B25077_001E',\n","    field_name='Median_House_Value'\n",")\n","\n","\n","\n","# Merge income and house value data into a single table\n","final_data = pd.merge(income_data_all, house_value_data_all, on=['State_FIPS', 'County_FIPS', 'Census_Tract'], how='outer')\n","\n","# Generate Unique ID directly in the table\n","final_data['Unique_ID'] = final_data['State_FIPS'] + final_data['County_FIPS'] + final_data['Census_Tract']\n","\n","# Reorder columns to place Unique_ID first\n","cols = ['Unique_ID'] + [col for col in final_data.columns if col != 'Unique_ID']\n","final_data = final_data[cols]\n","\n","desired_column_order 
= ['Unique_ID', 'State_FIPS', 'County_FIPS', 'Census_Tract']  # Add other core columns\n","\n","# Dynamically add the income and house value columns\n","for year in valid_years:\n","    desired_column_order.extend([f'Median_Household_Income_{year}'])\n","\n","for year in valid_years:\n","    desired_column_order.extend([f'Median_House_Value_{year}'])\n","\n","# Apply the desired column order\n","final_data = final_data[desired_column_order]\n","\n"],"metadata":{"id":"B_wNtW5Ec5Ia","executionInfo":{"status":"ok","timestamp":1742060959818,"user_tz":240,"elapsed":32013,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":10,"outputs":[]},{"cell_type":"code","source":["# Define the rows for each of the old census tracts\n","row_3302 = get_row('003302')\n","row_3705 = get_row('003705')\n","row_4201 = get_row('004201')\n","row_4301 = get_row('004301')\n","row_4304 = get_row('004304')\n","row_4310 = get_row('004310')\n","row_4405 = get_row('004405')\n","\n","# Define the dictionary mapping new census tracts to their corresponding old row\n","tract_values = {\n","    ('003303', '003304'): row_3302,\n","    ('003707', '003708'): row_3705,\n","    ('004205', '004206'): row_4201,\n","    ('004313', '004314'): row_4301,\n","    ('004315', '004316'): row_4304,\n","    ('004311', '004312'): row_4310,\n","    ('004408', '004409'): row_4405\n","}\n","\n","\n","# Only the pre-2020 years are backfilled; the new tracts already have their own 2020-2022 values\n","years = range(2012, 2020)\n","\n","# Call the function to update the census data\n","columns_to_update = [f'Median_Household_Income_{year}' for year in years] + [f'Median_House_Value_{year}' for year in years] # Add other median variables here\n","final_data = update_median_census_tracts(final_data, tract_values, columns_to_update, years=years)\n","\n","# Additional row filtering - these rows contained only NA values or -666666666 (meaning that the data was hidden for privacy reasons)\n","final_data = final_data[final_data['Census_Tract'] != '002400']\n","final_data = 
final_data[final_data['Census_Tract'] != '980000']\n","\n","# Save the complete data to the specified path\n","file_path = '/path/to/your/data/' + 'median_household_income_and_median_house_value.csv'  # Replace with actual file directory and name\n","final_data.to_csv(file_path, index=False)\n","\n","# Print success message with file path\n","print(f\"Data successfully saved to {file_path}\")"],"metadata":{"id":"eo9hSvMh4bgH"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["print(final_data['Census_Tract'].unique())  # Print unique values in the 'Census_Tract' column\n","print('003302' in final_data['Census_Tract'].unique())  # Check if '003302' (one of the tracts removed) is present"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"qx4M2irnfQyw","executionInfo":{"status":"ok","timestamp":1739570620560,"user_tz":300,"elapsed":120,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}},"outputId":"e8ddc1e3-08b4-4743-f4ab-39ba5e2f5f7a","collapsed":true},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["['000100' '000200' '000300' '000400' '000500' '000600' '000700' '000800'\n"," '000900' '001000' '001101' '001102' '001200' '001300' '001401' '001402'\n"," '001500' '001600' '001700' '001800' '001900' '002000' '002100' '002200'\n"," '002400' '002500' '002601' '002602' '002701' '002702' '002800' '002902'\n"," '002903' '002904' '003001' '003002' '003003' '003004' '003006' '003007'\n"," '003008' '003009' '003100' '003201' '003202' '003301' '003302' '003303'\n"," '003304' '003400' '003500' '003601' '003602' '003703' '003704' '003705'\n"," '003706' '003707' '003708' '003800' '003902' '003903' '003904' '003905'\n"," '004000' '004201' '004203' '004204' '004205' '004206' '004301' '004304'\n"," '004306' '004307' '004309' '004310' '004311' '004312' '004313' '004314'\n"," '004315' '004316' '004403' '004404' '004405' '004406' '004407' '004408'\n"," '004409' '004501' '004502' 
'980000']\n","False\n"]}]},{"cell_type":"markdown","source":["# Race Populations"],"metadata":{"id":"DY_3uick0IAg"}},{"cell_type":"code","source":["# Define fields for different population categories\n","fields = {\n","    'Total_Population': 'B02001_001E',\n","    'White_Population': 'B02001_002E',\n","    'Black_Population': 'B02001_003E',\n","    'Native_Population': 'B02001_004E',\n","    'Asian_Population': 'B02001_005E',\n","    'Hawaiian_Population': 'B02001_006E',\n","    'Other_Population': 'B02001_007E',\n","    'Two_Population': 'B02001_008E'\n","}\n","\n","# Fetch data for all fields\n","population_dataframes = {}\n","for field_name, field in fields.items():\n","    population_dataframes[field_name] = fetch_census_data_by_field(\n","        state_fips,\n","        county_fips,\n","        valid_years,\n","        field,\n","        field_name\n","    )\n","\n","\n","# Merge all population data into a single DataFrame\n","final_data = None\n","for df in population_dataframes.values():\n","    if final_data is None:\n","        final_data = df\n","    else:\n","        final_data = pd.merge(final_data, df, on=['State_FIPS', 'County_FIPS', 'Census_Tract'], how='outer')\n","\n","# Generate Unique ID directly in the table\n","final_data['Unique_ID'] = final_data['State_FIPS'] + final_data['County_FIPS'] + final_data['Census_Tract']\n","\n","# Reorder columns to place Unique_ID first\n","cols = ['Unique_ID'] + [col for col in final_data.columns if col != 'Unique_ID']\n","final_data = final_data[cols]\n","\n","# Define desired column order\n","desired_column_order = ['Unique_ID', 'State_FIPS', 'County_FIPS', 'Census_Tract']\n","for field_name in fields.keys():\n","    for year in valid_years:\n","        desired_column_order.append(f'{field_name}_{year}')\n","\n","# Apply the desired column order\n","final_data = 
final_data[desired_column_order]"],"metadata":{"id":"TwQ5sUAq0Jj3","executionInfo":{"status":"ok","timestamp":1742061221462,"user_tz":240,"elapsed":101214,"user":{"displayName":"Kingston Li","userId":"00090777972792071524"}}},"execution_count":13,"outputs":[]},{"cell_type":"code","source":["# Define the rows for each of the old census tracts\n","row_3302 = get_row('003302')\n","row_3705 = get_row('003705')\n","row_4201 = get_row('004201')\n","row_4301 = get_row('004301')\n","row_4304 = get_row('004304')\n","row_4310 = get_row('004310')\n","row_4405 = get_row('004405')\n","row_2400 = get_row('002400')\n","\n","# Define the dictionary mapping new census tracts to their corresponding old row\n","tract_values = {\n","    ('003303', '003304'): row_3302,\n","    ('003707', '003708'): row_3705,\n","    ('004205', '004206'): row_4201,\n","    ('004313', '004314'): row_4301,\n","    ('004315', '004316'): row_4304,\n","    ('004311', '004312'): row_4310,\n","    ('004408', '004409'): row_4405,\n","    ('980000',): row_2400 # this row only changed in number, did not split\n","}\n","\n","\n","# Define the years for which you want to replace values\n","years = range(2012, 2020) # 2012-2019 are changed\n","\n","# Define the columns to be updated (pre-2020 years only, so the new tracts keep their own 2020-2022 values)\n","population_types = ['Total', 'White', 'Black', 'Native', 'Asian', 'Hawaiian', 'Other', 'Two']\n","columns_to_update = [f'{pop}_Population_{year}' for pop in population_types for year in years]\n","\n","# Call the function to update the census data\n","final_data = update_sum_census_tracts(final_data, tract_values, columns_to_update, years=years)\n","\n","\n","# Save the complete data to the specified path\n","file_path = '/path/to/your/data/' + 'race_populations.csv' # Replace with actual file directory and name\n","final_data.to_csv(file_path, index=False)\n","\n","# Print success message with file path\n","print(f\"Data successfully saved to 
{file_path}\")"],"metadata":{"id":"6npJNG4nhRV1"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["race_data = pd.read_csv('/path/to/your/data/race_populations.csv')\n","percentage_data = race_data.copy()  # Start with a copy of the original data\n","\n","valid_years = range(2012, 2023)\n","\n","for year in valid_years:\n","    # Calculate percentages for each race category\n","    percentage_data[f'White_Population_{year}'] = (race_data[f'White_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","    percentage_data[f'Black_Population_{year}'] = (race_data[f'Black_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","    percentage_data[f'Native_Population_{year}'] = (race_data[f'Native_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","    percentage_data[f'Asian_Population_{year}'] = (race_data[f'Asian_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","    percentage_data[f'Hawaiian_Population_{year}'] = (race_data[f'Hawaiian_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","    percentage_data[f'Other_Population_{year}'] = (race_data[f'Other_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","    percentage_data[f'Two_Population_{year}'] = (race_data[f'Two_Population_{year}'] / race_data[f'Total_Population_{year}']) * 100\n","\n","\n","file_path = '/path/to/your/data/' + 'race_populations_percentages.csv' # Replace with actual file directory and name\n","percentage_data.to_csv(file_path, index=False)\n","print(f\"Percentage data successfully saved to {file_path}\")"],"metadata":{"id":"euxHqjjg8KP5"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["# Other Variables"],"metadata":{"id":"xGahAFpaiGyj"}},{"cell_type":"markdown","source":["These are other variables that could be taken into consideration for their impacts on house value. Note that not all variables here are recorded in the U.S. 
Census Bureau's American Community Survey, and other methods of cleaning the data will be necessary for other types of statistics, like means.\n","- Average household size\n","- School quality\n","- Age of residents\n","- Income\n","- Family size\n","- Race\n","- Crime rates\n","- Policies\n","- Distance from economic centers or malls\n","- Distance from highways, train stations, and other public transport\n","- Interest rates\n","- Unemployment rates\n","- Inflation rates\n","- Air quality, noise pollution\n","- New infrastructure construction\n","- Languages spoken\n","- Migration patterns\n","- New businesses/innovation\n","- Number of gas stations\n","- Number of fast food restaurants"],"metadata":{"id":"vuRiocjriI55"}}]}